Dear Hiring Manager,

My name is Andy Heusser and I'm applying for the Data Scientist-Product position at Spotify. It would be an absolute dream of mine to analyze music and music-related data for a living. As a musician, data scientist and long-time Spotify enthusiast, I'm confident this position would be a great fit for me. In this letter and in the notebook below, I hope to express how my training in neuroscience, computational modeling, machine learning and software development will allow me to bring a unique and fruitful perspective to the data science team.

Prior to starting graduate school, I worked as a research scientist studying how the human brain creates and stores memories. While I was contemplating whether to go to graduate school, I stumbled across a small company called the Echo Nest, and was blown away by how cool it was. The Echo Nest was a perfect marriage of two of my lifelong passions: music and data science*. At the time, I was underqualified to do anything useful for the company, so instead of applying for a position I stuck with my plan A: go to graduate school and become a data ninja.

As a PhD student at NYU, I focused on how brain rhythms (i.e. neural oscillations) relate to forming memories for sequences of events. These explorations culminated in a paper published in Nature Neuroscience this past fall, plus a few other papers now in the final stages of preparation. During this time, I gained a great deal of experience in signal processing, such as spectral decomposition and phase-based analyses, along with statistics and machine learning.

In my current position as a Postdoctoral Fellow at Dartmouth College, my research is focused on personalizing education, both online and in the classroom. Using machine vision, speech recognition and language modeling, my approach is to create models of the content of educational videos, as well as models of what you (a hypothetical student) know about the world. Linking these two representations allows us to adaptively present new educational videos that maximize learning for an individual.

In addition to research, I've developed a number of open-source software packages, including a Python toolbox for visualizing and manipulating high-dimensional data, and several other Python/JavaScript projects (learning analysis software, a Bayesian knowledge tracing visualization, an academic article tracker, stimulus presentation software). Furthermore, I worked for OpenBCI, an awesome company that creates open-source EEG hardware, which gave me great experience translating data-driven insights from the science world to the business world. When I'm not furiously coding, you can find me backpacking or playing music.

After many, many years of training, I feel that I have finally reached data ninjahood, and am prepared to take on my biggest challenge yet: extracting meaning from a database of every relevant song that has ever been written... <deep breath>. Below, please find a short data expedition that highlights my analysis chops and unique approach to time-series analysis.

Sincerely,

Andy Heusser

*As it turns out, Spotify acquired the Echo Nest, which I think was an excellent idea.

Exploring the feature space of Daft Punk and Fleet Foxes music

Below is a short data exploration into the structure of Daft Punk and Fleet Foxes music, as captured by the audio_analysis endpoint of the Spotify API. My basic approach will be to extract song features such as beats, bars, pitch and timbre, do some munging to combine them into a single samples-by-features matrix, and then visualize the temporal structure of the songs.
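As a minimal sketch of that munging step (entirely synthetic event times below, not real Spotify data), each event list can be resampled onto a common millisecond grid and the results concatenated column-wise:

```python
import numpy as np
import pandas as pd

# Synthetic event onsets (in seconds) standing in for Spotify's beat/segment lists
beat_times = [0.5, 1.0, 1.5, 2.0]
segment_times = [0.0, 0.8, 1.7]

def to_series(times, values, name):
    """Place values at event times, then resample onto a 1 ms grid."""
    s = pd.Series(values, index=pd.to_datetime(times, unit='s'), name=name)
    return s.resample('1ms').mean()

beats = to_series(beat_times, [1] * len(beat_times), 'beat').fillna(0)
loudness = to_series(segment_times, [-10.0, -5.0, -7.5], 'loudness').ffill()

# Column-wise concatenation yields a single samples-by-features matrix
data = pd.concat([beats, loudness], axis=1).fillna(0)
print(data.shape)  # (2001, 2): 1 ms samples from 0.000 s to 2.000 s
```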

Import libraries

In [1]:
import pandas as pd
import hypertools as hyp
import seaborn as sns
import spotipy
import numpy as np
from spotipy.oauth2 import SpotifyClientCredentials
from sklearn import svm
from sklearn.model_selection import cross_val_score
import IPython


client_credentials_manager = SpotifyClientCredentials()
sp = spotipy.Spotify(client_credentials_manager=client_credentials_manager)

sns.set_context('talk')
%matplotlib inline

Search for a track

In [2]:
track = sp.search('Daft Punk Get Lucky', limit=1)

Get the URI

In [3]:
uri = track['tracks']['items'][0]['uri']

Quick listen!

In [4]:
url = "https://open.spotify.com/embed/track/" + uri.split(':')[-1]
iframe = '<iframe src=' + url + ' width=700 height=350></iframe>'
IPython.display.HTML(iframe)
Out[4]:

Analysis

In [5]:
analysis = sp.audio_analysis(uri)
analysis.keys()
Out[5]:
[u'bars', u'track', u'segments', u'beats', u'meta', u'sections', u'tatums']

Resample beats

In [6]:
beats = pd.DataFrame(analysis['beats'])
beats.index = pd.to_datetime(beats['start'], unit='s')
beats['beat'] = 1
beats = beats.resample('1L').mean()
beats = beats['beat'].fillna(0)
beats.head()
Out[6]:
start
1970-01-01 00:00:00.164    1.0
1970-01-01 00:00:00.165    0.0
1970-01-01 00:00:00.166    0.0
1970-01-01 00:00:00.167    0.0
1970-01-01 00:00:00.168    0.0
Freq: L, Name: beat, dtype: float64

Resample bars

In [7]:
bars = pd.DataFrame(analysis['bars'])
bars.index = pd.to_datetime(bars['start'], unit='s')
bars['bar'] = range(1,bars.shape[0]+1)
bars = bars.resample('1L').mean()
bars = bars['bar'].ffill()
bars.head()
Out[7]:
start
1970-01-01 00:00:01.208    1.0
1970-01-01 00:00:01.209    1.0
1970-01-01 00:00:01.210    1.0
1970-01-01 00:00:01.211    1.0
1970-01-01 00:00:01.212    1.0
Freq: L, Name: bar, dtype: float64

Resample pitches

In [8]:
segments = pd.DataFrame(analysis['segments'])
pitches = pd.DataFrame(segments['pitches'].tolist())
pitches.index = pd.to_datetime(segments['start'], unit='s')
pitches = pitches.resample('1L').mean().ffill()
pitches.head()
Out[8]:
0 1 2 3 4 5 6 7 8 9 10 11
start
1970-01-01 00:00:00.000 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 0.225 0.423 0.711 0.601
1970-01-01 00:00:00.001 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 0.225 0.423 0.711 0.601
1970-01-01 00:00:00.002 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 0.225 0.423 0.711 0.601
1970-01-01 00:00:00.003 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 0.225 0.423 0.711 0.601
1970-01-01 00:00:00.004 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 0.225 0.423 0.711 0.601

Resample timbre

In [9]:
segments = pd.DataFrame(analysis['segments'])
timbre = pd.DataFrame(segments['timbre'].tolist())
timbre.index = pd.to_datetime(segments['start'], unit='s')
timbre = timbre.resample('1L').mean().ffill()
timbre.head()
Out[9]:
0 1 2 3 4 5 6 7 8 9 10 11
start
1970-01-01 00:00:00.000 15.327 108.968 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.001 15.327 108.968 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.002 15.327 108.968 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.003 15.327 108.968 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.004 15.327 108.968 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
In [10]:
data = pd.concat([bars, beats, pitches, timbre], axis=1).fillna(0)
data.head()
Out[10]:
bar beat 0 1 2 3 4 5 6 7 ... 2 3 4 5 6 7 8 9 10 11
start
1970-01-01 00:00:00.000 0.0 0.0 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 ... 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.001 0.0 0.0 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 ... 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.002 0.0 0.0 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 ... 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.003 0.0 0.0 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 ... 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433
1970-01-01 00:00:00.004 0.0 0.0 0.503 0.496 0.49 0.511 0.518 0.646 1.0 0.703 ... 107.774 -373.559 34.593 -33.302 80.025 -121.103 -41.559 62.998 57.512 0.433

5 rows × 26 columns

Now that the song features are in a matrix representation, we can start to explore the model visually. Below is a heatmap with time samples on the y-axis and features on the x-axis.

Plot the model

In [11]:
sns.heatmap(data, yticklabels=False, xticklabels=False, vmin=-2, vmax=2)
sns.set_context('talk')
sns.plt.show()

It appears that there are some scaling differences between the feature columns. To deal with this, I'll normalize the columns using Hypertools, some software I wrote to analyze and visualize high-dimensional data.

Normalize columns and replot

In [12]:
sns.heatmap(hyp.tools.normalize(data, normalize='within'), yticklabels=False, xticklabels=False, vmin=-2, vmax=2)
sns.set_context('talk')
sns.plt.show()

Ah, that's better. Now we can take a look at the temporal structure of the song by computing a timepoint by timepoint correlation matrix. This matrix represents the correlation of each timepoint with every other timepoint, allowing you to see what parts of the song are more similar to each other, and what parts are more distinct.

Plot the timepoint by timepoint correlation matrix

In [13]:
sns.heatmap(data.resample('1S').mean().T.corr(), yticklabels=False, xticklabels=False)
sns.set_context('talk')
sns.plt.show()

There is definitely some interesting structure in there! Overall, the timepoints in this song are highly correlated with one another, but there is also some blocky structure along the diagonal of the matrix, suggesting distinct sections of the song. Below, I'll wrap the model-retrieval code into a function so that I can easily fetch models for other songs.
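As an aside, here is a toy illustration (all data below is synthetic) of why distinct sections show up as blocks along the diagonal: timepoints that share a feature profile correlate strongly with each other and weakly, or negatively, with timepoints from a different section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two fake "sections": 50 timepoints x 12 features, with different mean profiles
section_a = rng.normal(scale=0.2, size=(50, 12)) + np.linspace(0, 1, 12)
section_b = rng.normal(scale=0.2, size=(50, 12)) + np.linspace(1, 0, 12)
song = np.vstack([section_a, section_b])

# Timepoint-by-timepoint correlation matrix (np.corrcoef correlates rows)
corr = np.corrcoef(song)

# Within-section correlations form bright diagonal blocks;
# across-section correlations are low or negative
within = corr[:50, :50].mean()
across = corr[:50, 50:].mean()
print(within > across)  # True
```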

Define a function to get song model

In [14]:
def get_model(uri, normalize=False):
    
    analysis = sp.audio_analysis(uri)
    
    beats = pd.DataFrame(analysis['beats'])
    beats.index = pd.to_datetime(beats['start'], unit='s')
    beats['beat'] = 1
    beats = beats.resample('1L').mean()
    beats = beats['beat'].fillna(0)
    
    bars = pd.DataFrame(analysis['bars'])
    bars.index = pd.to_datetime(bars['start'], unit='s')
    bars['bar'] = range(1, bars.shape[0] + 1)
    bars = bars.resample('1L').mean()
    bars = bars['bar'].ffill()
    
    segments = pd.DataFrame(analysis['segments'])
    pitches = pd.DataFrame(segments['pitches'].tolist())
    pitches.index = pd.to_datetime(segments['start'], unit='s')
    pitches = pitches.resample('1L').mean().ffill()
    
    timbre = pd.DataFrame(segments['timbre'].tolist())
    timbre.index = pd.to_datetime(segments['start'], unit='s')
    timbre = timbre.resample('1L').mean().ffill()

    data = pd.concat([bars, beats, pitches, timbre], axis=1).fillna(0)
    
    if normalize:
        return pd.DataFrame(hyp.tools.normalize(data, normalize='within'), index=data.index)
    else:
        return pd.DataFrame(data, index=data.index)

def get_models(uris, normalize=False):
    return [get_model(uri, normalize=normalize) for uri in uris]

Next, I'll take a look at a newer song by Fleet Foxes called "Third of May / Ōdaigahara", which is embedded below. I picked this song for three reasons: 1) I like it, 2) Fleet Foxes have a more organic feel than Daft Punk, so I'd expect the temporal structure to be different, and 3) the song is composed of two distinct parts (Third of May and Ōdaigahara), so we may be able to see that structure.

In [15]:
url = "https://open.spotify.com/embed/track/2tUiuvEUs1Tw2pnfZUeIz1"
iframe = '<iframe src=' + url + ' width=700 height=350></iframe>'
IPython.display.HTML(iframe)
Out[15]:

Get model

In [16]:
track = sp.search('Fleet Foxes Third of May', limit=1)
uri = track['tracks']['items'][0]['uri']
model = get_model(uri, normalize=True)

Plot the model

In [17]:
sns.heatmap(model, yticklabels=False, xticklabels=False, vmin=-2, vmax=2)
sns.set_context('talk')
sns.plt.show()

Plot the timepoint-by-timepoint correlation matrix

In [18]:
sns.heatmap(model.resample('1S').mean().T.corr(), yticklabels=False, xticklabels=False)
sns.set_context('talk')
sns.plt.show()

There is also some neat structure in this song. First, it's very different from the structure of the Daft Punk song. Second, it seems to be split into two distinct parts, which map onto the two sections of the song. Let's now do a little smoothing to better visualize this structure.

In [19]:
sns.heatmap(model.resample('1S').mean().rolling(10).mean().fillna(0).T.corr(), yticklabels=False, xticklabels=False)
sns.set_context('talk')
sns.plt.show()

It actually appears that there are (at least) three distinct parts. Using Hypertools, I'll see if I can recover that structure with the cluster tool, which performs k-means clustering, and then plot the result.

In [20]:
group = hyp.tools.cluster(model.resample('1S').mean().rolling(10).mean().fillna(0), 3)
_, _, _, _ = hyp.plot(model.resample('1S').mean(), '.', group=group, legend=['First', 'Middle', 'End'])

What is shown above is the Fleet Foxes song "Third of May / Ōdaigahara" plotted in 3D using dimensionality reduction (Incremental PCA). Each dot is a time sample, and the colors correspond to the clusters recovered by the cluster tool. We can also visualize those cluster labels directly and see that the song has been parsed into three chunks, which map onto the three segments we noticed in the timepoint-by-timepoint correlation matrix of this song. Cool!
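For readers without Hypertools installed, the same idea can be sketched directly with scikit-learn (synthetic data below; KMeans and IncrementalPCA stand in for what the cluster tool and hyp.plot do here, though this is my approximation rather than Hypertools' exact implementation):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(1)

# Fake smoothed song model: three sections with different feature means
sections = [rng.normal(loc=m, scale=0.2, size=(40, 26)) for m in (0.0, 1.0, 2.0)]
model_matrix = np.vstack(sections)

# Cluster timepoints with k-means (as hyp.tools.cluster does) ...
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(model_matrix)

# ... and reduce to 3D for plotting, as hyp.plot does with Incremental PCA
coords = IncrementalPCA(n_components=3).fit_transform(model_matrix)

# Each contiguous 40-sample section should land in a single cluster
print([len(set(labels[i:i + 40])) for i in (0, 40, 80)])  # [1, 1, 1]
```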

In [21]:
sns.tsplot(group)
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x10d8f8b50>

Now, let's take a look at the structure of the whole album. To see how one song relates to the next, we can average each song's model over timepoints and then correlate the resulting feature vector of each song with that of every other song.

Analyze the whole album

In [22]:
album_fleetfoxes = sp.album('spotify:album:0xtTojp4zfartyGtbFKN3v')
uris_fleetfoxes = [track['uri'] for track in album_fleetfoxes['tracks']['items']]

models_fleetfoxes = get_models(uris_fleetfoxes)
feature_vecs_fleetfoxes = np.array([m.mean(axis=0).values for m in models_fleetfoxes])

Song by song correlation matrix

In [23]:
sns.heatmap(pd.DataFrame(feature_vecs_fleetfoxes).T.corr())
sns.plt.show()

Based on the structure of this album, it looks like the first two songs are similar to each other, the early middle is relatively unique, and the end is pretty homogeneous. We can also cluster the matrix using seaborn's clustermap:

In [24]:
sns.clustermap(pd.DataFrame(feature_vecs_fleetfoxes).T.corr(), metric='correlation')
sns.plt.show()

Looks like there are 3 chunks to the album. Now, I'll use k-means clustering to see if we can recover that structure, then plot the songs, coloring each song by cluster and reducing the data with multi-dimensional scaling (MDS) so that we can visualize it in 2D.

In [25]:
track_titles = [str(idx) + ' ' + track['name'][:10] + '...' for idx, track in enumerate(album_fleetfoxes['tracks']['items'])]
_, _, _, _ = hyp.plot(feature_vecs_fleetfoxes, 'o', model='MDS', n_clusters=3, ndims=2, labels=track_titles)

Interestingly, the album groups into 3 chunks (the first 2, middle 5 and final 4 songs), as the album seems to roughly follow an arc from beginning to end. We can play the same game with the Daft Punk album:

In [26]:
album_daftpunk = sp.album('spotify:album:4m2880jivSbbyEGAKfITCa')
uris_daftpunk = [track['uri'] for track in album_daftpunk['tracks']['items']]
models_daftpunk = get_models(uris_daftpunk)
feature_vecs_daftpunk = np.array([m.mean(axis=0).values for m in models_daftpunk])
In [27]:
sns.heatmap(pd.DataFrame(feature_vecs_daftpunk).T.corr())
sns.plt.show()

sns.clustermap(pd.DataFrame(feature_vecs_daftpunk).T.corr(), metric='correlation')
sns.plt.show()

track_titles_daftpunk = [str(idx) + ' ' + track['name'][:10] + '...' for idx, track in enumerate(album_daftpunk['tracks']['items'])]
_, _, _, _ = hyp.plot(feature_vecs_daftpunk, 'o', model='MDS', ndims=2, labels=track_titles_daftpunk, n_clusters=3)

Despite being very different albums, both the Fleet Foxes and Daft Punk albums appear to separate into 3 ordered chunks. Perhaps this could be a feature of other great albums? Now I'll plot the songs from both albums in the same MDS space so we can compare them directly.

In [36]:
_, _, _, _ = hyp.plot([feature_vecs_fleetfoxes, feature_vecs_daftpunk], 'o', 
                      model='MDS', ndims=2, legend=['Fleet Foxes', 'Daft Punk'], normalize='across')

The songs are almost perfectly separable, with the exception of one Daft Punk song...what is it? It turns out the most Fleet Foxy Daft Punk song is this:

In [29]:
url = "https://open.spotify.com/embed/track/2KHRENHQzTIQ001nlP9Gdc"
iframe = '<iframe src=' + url + ' width=700 height=350></iframe>'
IPython.display.HTML(iframe)
Out[29]:

Since these two albums separate so well, let's see if we can classify the songs based on their time-averaged feature vectors. I'll use a linear support vector machine as a classifier.

In [47]:
data = hyp.tools.normalize(np.vstack([feature_vecs_fleetfoxes, feature_vecs_daftpunk]), normalize='across')
labels = ['Fleet Foxes']*feature_vecs_fleetfoxes.shape[0]+['Daft Punk']*feature_vecs_daftpunk.shape[0]
In [48]:
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, data, labels)
print('SVC classification accuracy: %f' % np.mean(scores))         
SVC classification accuracy: 0.790344

Nearly 80% accuracy, not bad for a first pass. What if I first reduce the dimensionality using MDS, as in the visualization above?

In [49]:
data = hyp.tools.reduce(data, model='MDS', ndims=2)
clf = svm.SVC(kernel='linear', C=1)
scores = cross_val_score(clf, data, labels)
print('SVC classification accuracy after MDS dimensionality reduction: %f' % np.mean(scores))    
SVC classification accuracy after MDS dimensionality reduction: 0.952381

Ah! Much better :)
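One caveat worth flagging: here MDS was fit on the full dataset before cross-validation, so the test folds leak into the embedding and the estimate is optimistic. Scikit-learn's MDS has no out-of-sample transform, but the leakage-free pattern can be sketched with PCA inside a Pipeline instead (synthetic feature vectors below, standing in for the real ones):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)

# Synthetic stand-ins for the two albums' 26-dimensional feature vectors
album_a = rng.normal(loc=0.0, size=(11, 26))  # 11 "Fleet Foxes" songs
album_b = rng.normal(loc=1.5, size=(13, 26))  # 13 "Daft Punk" songs
X = np.vstack([album_a, album_b])
y = ['fleet foxes'] * 11 + ['daft punk'] * 13

# The reducer is refit inside every training fold, so no test data leaks in
pipe = make_pipeline(StandardScaler(), PCA(n_components=2), SVC(kernel='linear', C=1))
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```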

Finally, I just wanted to point out a neat observation. Each song on the Fleet Foxes album has its own unique temporal structure, its own trajectory through feature space. It would be really neat to use the shape of these trajectories as classification features, or to find songs that are similar in trajectory, but not necessarily similar in features (beats, pitch, timbre, etc). Scroll below to see the temporal 'fingerprint' of each song, as well as its trajectory. Thank you for reading!!

In [33]:
track_titles_fleetfoxes = [str(idx) + ' ' + track['name'] for idx, 
                           track in enumerate(album_fleetfoxes['tracks']['items'])]

for idx, model in enumerate(models_fleetfoxes):
    ax = sns.heatmap(model.resample('1S').mean().T.corr(), yticklabels=False, xticklabels=False)
    ax.set_title(track_titles_fleetfoxes[idx])
    sns.plt.show()
    hyp.plot(model.resample('1S').mean(), 
             title=track_titles_fleetfoxes[idx])
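As a rough sketch of that trajectory idea (a hypothetical approach, not part of the analysis above), one could resample each song's trajectory to a common length and correlate the flattened paths, comparing shape over time rather than time-averaged features:

```python
import numpy as np

def resample_trajectory(traj, n_points=100):
    """Linearly interpolate an (n_samples, n_features) path to n_points rows."""
    traj = np.asarray(traj, dtype=float)
    old_t = np.linspace(0, 1, traj.shape[0])
    new_t = np.linspace(0, 1, n_points)
    return np.column_stack([np.interp(new_t, old_t, traj[:, j])
                            for j in range(traj.shape[1])])

def trajectory_similarity(a, b, n_points=100):
    """Correlate two flattened, length-normalized trajectories."""
    ra = resample_trajectory(a, n_points).ravel()
    rb = resample_trajectory(b, n_points).ravel()
    return np.corrcoef(ra, rb)[0, 1]

# Toy trajectories: the same looping path at two lengths, plus a reversed copy
t = np.linspace(0, 6, 80)
path_short = np.column_stack([np.sin(t), np.cos(t)])
t2 = np.linspace(0, 6, 120)
path_long = np.column_stack([np.sin(t2), np.cos(t2)])
path_reversed = path_short[::-1]

# Same shape at different lengths -> high similarity; reversed -> low
print(trajectory_similarity(path_short, path_long) > 0.99)     # True
print(trajectory_similarity(path_short, path_reversed) < 0.5)  # True
```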

Wow, thank you for reading the whole thing! :)